EXP_SURVIVAL_REG
Overview
The EXP_SURVIVAL_REG function fits a parametric exponential survival regression model to analyze time-to-event data with covariates. This approach is commonly used in survival analysis to study the relationship between predictor variables and the time until an event occurs, such as equipment failure, patient recovery, or customer churn.
The exponential survival model assumes a constant hazard rate over time, making it the simplest parametric survival distribution. This memoryless property means that the probability of an event occurring in the next time interval is independent of how much time has already elapsed. The exponential distribution is characterized by a single rate parameter \lambda, with the survival function:
S(t) = e^{-\lambda t}
and hazard function h(t) = \lambda (constant).
This implementation leverages statsmodels Generalized Linear Models (GLM) framework with a Poisson family and log link. By using the logarithm of survival time as an offset term, the GLM-Poisson approach provides an efficient method for fitting exponential survival regression. The model estimates log-hazard coefficients:
\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
where \lambda_i is the hazard rate for observation i and x_{ij} are the covariate values.
The function handles right-censored data, where some observations have not yet experienced the event by the end of the study (indicated by event = 0). Output includes coefficient estimates, standard errors, z-statistics, p-values, confidence intervals at the specified significance level, and hazard ratios (exponentiated coefficients) which represent the multiplicative change in hazard for a one-unit increase in the covariate. Model fit statistics including log-likelihood, AIC, and BIC are also provided.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=EXP_SURVIVAL_REG(time, event, x, fit_intercept, alpha)
time(list[list], required): Time to event or censoring (positive values).event(list[list], required): Event indicator (1 = event occurred, 0 = censored).x(list[list], required): Covariates (predictor variables) matrix.fit_intercept(bool, optional, default: true): If True, includes intercept in the model.alpha(float, optional, default: 0.05): Significance level for confidence intervals (0 to 1).
Returns (list[list]): 2D list with survival regression, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| time | event | x |
|---|---|---|
| 5 | 1 | 1 |
| 8 | 1 | 2 |
| 12 | 0 | 3 |
| 15 | 1 | 2.5 |
| 20 | 1 | 4 |
Excel formula:
=EXP_SURVIVAL_REG({5;8;12;15;20}, {1;1;0;1;1}, {1;2;3;2.5;4})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -1.138 | 1.344 | -0.8471 | 0.3969 | -3.771 | 1.495 | 0.3204 |
| x1 | -0.5937 | 0.5251 | -1.131 | 0.2582 | -1.623 | 0.4354 | 0.5523 |
| log_likelihood | -4.8 | ||||||
| aic | 13.6 | ||||||
| bic | 12.82 | ||||||
| scale_parameter | 1 |
Example 2: Demo case 2
Inputs:
| time | event | x | fit_intercept |
|---|---|---|---|
| 10 | 1 | 2 | false |
| 15 | 0 | 3 | |
| 20 | 1 | 4 | |
| 25 | 1 | 5 |
Excel formula:
=EXP_SURVIVAL_REG({10;15;20;25}, {1;0;1;1}, {2;3;4;5}, FALSE)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| x1 | -0.8671 | 0.1685 | -5.147 | 2.647e-7 | -1.197 | -0.5369 | 0.4202 |
| log_likelihood | -4.85 | ||||||
| aic | 11.7 | ||||||
| bic | 11.09 | ||||||
| scale_parameter | 1 |
Example 3: Demo case 3
Inputs:
| time | event | x | alpha |
|---|---|---|---|
| 6 | 1 | 1.5 | 0.1 |
| 9 | 1 | 2.2 | |
| 14 | 1 | 3.1 | |
| 18 | 0 | 2.8 | |
| 22 | 1 | 3.9 |
Excel formula:
=EXP_SURVIVAL_REG({6;9;14;18;22}, {1;1;1;0;1}, {1.5;2.2;3.1;2.8;3.9}, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -1.163 | 1.793 | -0.6484 | 0.5167 | -4.113 | 1.787 | 0.3126 |
| x1 | -0.5911 | 0.6439 | -0.9181 | 0.3586 | -1.65 | 0.4679 | 0.5537 |
| log_likelihood | -5.258 | ||||||
| aic | 14.52 | ||||||
| bic | 13.73 | ||||||
| scale_parameter | 1 |
Example 4: Demo case 4
Inputs:
| time | event | x | fit_intercept | alpha | |
|---|---|---|---|---|---|
| 7 | 1 | 1 | 0.5 | true | 0.05 |
| 11 | 1 | 2 | 1 | ||
| 16 | 0 | 3 | 1.5 | ||
| 19 | 1 | 2.5 | 1.2 | ||
| 24 | 1 | 4 | 2 |
Excel formula:
=EXP_SURVIVAL_REG({7;11;16;19;24}, {1;1;0;1;1}, {1,0.5;2,1;3,1.5;2.5,1.2;4,2}, TRUE, 0.05)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| intercept | -1.517 | 1.35 | -1.124 | 0.2612 | -4.163 | 1.129 | 0.2194 |
| x1 | -1.259 | 11.55 | -0.109 | 0.9132 | -23.9 | 21.38 | 0.2838 |
| x2 | 1.434 | 23.16 | 0.06191 | 0.9506 | -43.96 | 46.83 | 4.195 |
| log_likelihood | -4.825 | ||||||
| aic | 15.65 | ||||||
| bic | 14.48 | ||||||
| scale_parameter | 1 |
Python Code
import math
import numpy as np
import statsmodels.api as sm_api
def exp_survival_reg(time, event, x, fit_intercept=True, alpha=0.05):
"""
Fits a parametric exponential survival regression model.
See: https://www.statsmodels.org/stable/glm.html
This example function is provided as-is without any representation of accuracy.
Args:
time (list[list]): Time to event or censoring (positive values).
event (list[list]): Event indicator (1 = event occurred, 0 = censored).
x (list[list]): Covariates (predictor variables) matrix.
fit_intercept (bool, optional): If True, includes intercept in the model. Default is True.
alpha (float, optional): Significance level for confidence intervals (0 to 1). Default is 0.05.
Returns:
list[list]: 2D list with survival regression, or error message string.
"""
def to2d(arr):
return [[arr]] if not isinstance(arr, list) else arr
def validate_numeric(value, name):
if not isinstance(value, (int, float, bool)):
return f"Invalid input: {name} must be a number."
num = float(value)
if math.isnan(num) or math.isinf(num):
return f"Invalid input: {name} must be finite."
return num
def validate_2d_array(arr, name, expected_cols=None):
arr_2d = to2d(arr)
if not isinstance(arr_2d, list):
return f"Invalid input: {name} must be a 2D list."
if len(arr_2d) == 0:
return f"Invalid input: {name} must not be empty."
for i, row in enumerate(arr_2d):
if not isinstance(row, list):
return f"Invalid input: {name} row {i} must be a list."
if expected_cols is not None and len(row) != expected_cols:
return f"Invalid input: {name} must have {expected_cols} columns."
n_cols = len(arr_2d[0])
for i, row in enumerate(arr_2d):
if len(row) != n_cols:
return f"Invalid input: {name} must have consistent column count."
flat_values = []
for row in arr_2d:
for val in row:
validated = validate_numeric(val, f"{name} element")
if isinstance(validated, str):
return validated
flat_values.append(validated)
return np.array(flat_values).reshape(len(arr_2d), n_cols)
# Validate inputs
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
alpha_val = validate_numeric(alpha, "alpha")
if isinstance(alpha_val, str):
return alpha_val
if alpha_val <= 0 or alpha_val >= 1:
return "Invalid input: alpha must be between 0 and 1."
time_arr = validate_2d_array(time, "time", expected_cols=1)
if isinstance(time_arr, str):
return time_arr
event_arr = validate_2d_array(event, "event", expected_cols=1)
if isinstance(event_arr, str):
return event_arr
x_arr = validate_2d_array(x, "x")
if isinstance(x_arr, str):
return x_arr
# Check dimensions match
n_obs = time_arr.shape[0]
if event_arr.shape[0] != n_obs:
return "Invalid input: time and event must have same number of rows."
if x_arr.shape[0] != n_obs:
return "Invalid input: x must have same number of rows as time."
# Validate time values are non-negative and non-zero
time_flat = time_arr.flatten()
if np.any(time_flat <= 0):
return "Invalid input: time values must be positive."
# Validate event values are 0 or 1
event_flat = event_arr.flatten()
if not np.all((event_flat == 0) | (event_flat == 1)):
return "Invalid input: event values must be 0 or 1."
# Check for at least one event
if np.sum(event_flat) == 0:
return "Invalid input: at least one event must occur (event = 1)."
# Prepare data for GLM with Poisson family
# For exponential regression: log hazard = intercept + beta * x
# Using GLM with Poisson family and log(time) as offset
n_covariates = x_arr.shape[1]
# Add intercept if requested
if fit_intercept:
exog = sm_api.add_constant(x_arr, prepend=True)
param_names = ['intercept'] + [f"x{i+1}" for i in range(n_covariates)]
else:
exog = x_arr
param_names = [f"x{i+1}" for i in range(n_covariates)]
try:
# Fit GLM with Poisson family and log(time) as offset
# This is equivalent to exponential survival regression
model = sm_api.GLM(
endog=event_flat,
exog=exog,
family=sm_api.families.Poisson(),
offset=np.log(time_flat)
)
result = model.fit()
except Exception as exc:
return f"statsmodels error: {str(exc)}"
# Extract results
coefficients = result.params
std_errors = result.bse
z_stats = result.tvalues
p_values = result.pvalues
# Get confidence intervals
conf_int = result.conf_int(alpha=alpha_val)
# Calculate hazard ratios (exp of coefficients)
hazard_ratios = np.exp(coefficients)
# Build result table
headers = ['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'hazard_ratio']
result_table = [headers]
# Add parameter rows
for i, param_name in enumerate(param_names):
row = [
param_name,
float(coefficients[i]),
float(std_errors[i]),
float(z_stats[i]),
float(p_values[i]),
float(conf_int[i, 0]),
float(conf_int[i, 1]),
float(hazard_ratios[i])
]
result_table.append(row)
# Add model statistics
log_likelihood = float(result.llf)
aic = float(result.aic)
bic = float(result.bic_llf)
# For exponential model, scale parameter is 1 (constant hazard)
scale_param = 1.0
result_table.append(['log_likelihood', log_likelihood, "", "", "", "", "", ""])
result_table.append(['aic', aic, "", "", "", "", "", ""])
result_table.append(['bic', bic, "", "", "", "", "", ""])
result_table.append(['scale_parameter', scale_param, "", "", "", "", "", ""])
return result_table